Tool            Version               Main Parameters
BWA1            0.6.2
BWA2            0.6.2
Bowtie1         0.12.9                --fullref --sam -q --best --strata -k 10
Bowtie2         2.0.5                 --end-to-end -k 10
Cufflinks1      1.3.0                 --min-isoform-fraction 0.05 --multi-read-correct -G
Cufflinks2      2.0.0                 --min-isoform-fraction 0.05 --multi-read-correct -G
Flux-Capacitor  1.2.3-20121215021902
GSNAP           2012-07-20            -N 1 -A sam
HTSeq           0.5.3p9               htseq-count -i gene_id --mode=(union | intersection-nonempty) --stranded=no
OSA             2.0.1                 --alignrna SearchNovelExonJunction=True
Smalt           0.6.4                 -f samsoft
Star            2.2.0                 --outFilterMultimapNmax 10 --sjdbOverhang 20 --sjdbFileChrStartEnd
TopHat1         1.4.1                 --min-intron-length 6
TopHat2         2.0.6                 --no-coverage-search --min-intron-length 6

Table 1: Aligners and quantification methods: versions and parameters used.



Mapper    Splicing
BWA1      No
BWA2      No
Bowtie1   No
Bowtie2   No
GSNAP     Yes
OSA       Yes
Smalt     No
Star      Yes
TopHat1   Yes
TopHat2   Yes

Table 2: Mappers: support for splicing



Dataset     Species      Data                    FASTQ  SE  PE  RL
E-MTAB-513  Human        16 organism parts       32     16  16  75 & 50
SRP000225   Human        2 organism parts        6      6   0   36
E-MTAB-599  Mouse        organism part (6)       36     36  0   76
E-MTAB-387  E. coli K12  2 developmental stages  2      2   0   36

Table 3: Experimental data sets.
Dataset       SE/PE  RL   Depth
l50.d10.se    SE     50   10
l100.d10.se   SE     100  10
l150.d10.se   SE     150  10
l200.d10.se   SE     200  10
l50.d10.pe    PE     50   10
l100.d10.pe   PE     100  10
l150.d10.pe   PE     150  10
l200.d10.pe   PE     200  10
l50.d30.se    SE     50   30
l100.d30.se   SE     100  30
l150.d30.se   SE     150  30
l200.d30.se   SE     200  30
l50.d30.pe    PE     50   30
l100.d30.pe   PE     100  30
l150.d30.pe   PE     150  30
l200.d30.pe   PE     200  30
l50.d60.se    SE     50   60
l100.d60.se   SE     100  60
l150.d60.se   SE     150  60
l200.d60.se   SE     200  60
l50.d60.pe    PE     50   60
l100.d60.pe   PE     100  60
l150.d60.pe   PE     150  60
l200.d60.pe   PE     200  60
l50.d120.se   SE     50   120
l100.d120.se  SE     100  120
l150.d120.se  SE     150  120
l200.d120.se  SE     200  120
l50.d120.pe   PE     50   120
l100.d120.pe  PE     100  120
l150.d120.pe  PE     150  120
l200.d120.pe  PE     200  120



Table 4: Synthetic data sets. Each simulated data set is composed of 8 FASTQ
files for which the true number of raw counts per gene is known. The SE/PE
column indicates the pairing of the reads (SE = single-end, PE = paired-end),
the RL column indicates the read length, and Depth the sequencing depth.
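
The data set labels in Table 4 appear to encode the three design factors as l<read length>.d<depth>.<se|pe>. Assuming that naming scheme (it is inferred from the labels, not stated in the text), the full grid of 32 labels could be enumerated with a short Python sketch. The function name is illustrative only.

```python
from itertools import product

READ_LENGTHS = (50, 100, 150, 200)  # RL column
DEPTHS = (10, 30, 60, 120)          # Depth column
PAIRINGS = ("se", "pe")             # SE/PE column


def synthetic_dataset_names():
    """Return the 32 labels, assuming the pattern l<RL>.d<Depth>.<se|pe>.

    The order follows Table 4: depth varies slowest, then pairing,
    then read length.
    """
    return [
        f"l{rl}.d{depth}.{pairing}"
        for depth, pairing, rl in product(DEPTHS, PAIRINGS, READ_LENGTHS)
    ]


names = synthetic_dataset_names()
print(len(names), names[0], names[-1])
```

Every combination of read length, depth and pairing occurs exactly once, which is why the table has 4 x 4 x 2 = 32 rows.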
[Figure 1 plot. Legend: multiple pipelines; osa x htseq-ine. X-axis: log(mean expression).]


Figure 1: Experimental RNA-seq data from Human (SRP000225). A) Spearman
correlation distribution between the gene expression profiles inferred by
different pipelines; B) correlation between two specific pipelines (the
respective Spearman correlation is shown in plot A as a purple box); C) fold
change between the gene expression values inferred by the same two pipelines;
dots in red denote genes where the expression values are significantly
different between the two selected pipelines (for a false discovery rate of
0.01); D) expression values inferred by the two pipelines for the six selected
(boxed) genes in plot C.
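
As a minimal sketch of the statistic behind panels A and B, the Spearman correlation between the expression profiles of two pipelines can be computed as the Pearson correlation of their rank vectors. The helper names and the example values below are hypothetical, and the handling of ties (average ranks) is an assumption rather than something stated in the caption.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least 2 genes")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical expression values for five genes from two pipelines.
pipeline_a = [10, 200, 35, 0, 80]
pipeline_b = [12, 150, 40, 1, 300]
rho = spearman(pipeline_a, pipeline_b)
```

Because only ranks enter the computation, the statistic is insensitive to the different expression scales produced by count-based and FPKM-style quantification methods.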
Figure 2: Experimental RNA-seq data from mouse (E-MTAB-599). A) Spearman
correlation distribution between the gene expression profiles inferred by
different pipelines; B) correlation between two specific pipelines (the
respective Spearman correlation is shown in plot A as a purple box); C) fold
change between the gene expression values inferred by the same two pipelines;
dots in red denote genes where the expression values are significantly
different between the two selected pipelines (for a false discovery rate of
0.01); D) expression values inferred by the two pipelines for the six selected
(boxed) genes in plot C.
[Figure 3 plot. Legend: bwa2 x htseq-u. X-axis: log(mean expression).]


Figure 3: Experimental RNA-seq data from E. coli K12 (E-MTAB-387).
A) Spearman correlation distribution between the gene expression profiles
inferred by different pipelines; B) correlation between two specific pipelines
(the respective Spearman correlation is shown in plot A as a purple box);
C) fold change between the gene expression values inferred by the same
two pipelines; dots in red denote genes where the expression values are
significantly different between the two selected pipelines (for a false discovery
rate of 0.01); D) expression values inferred by the two pipelines for the six
selected (boxed) genes in plot C.
Figure 4: Distribution of the error across all data sets and pipelines
mented by pipelines using spliced and unspliced aligners. 
Figure 5: Number of genes with high error (> 100%) or low error (< 10%)
across all data sets and: i) all pipelines; ii) pipelines with spliced aligners;
iii) pipelines combining OSA or TopHat1 with htseq-ine, Cufflinks2, and
Flux-Capacitor.
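
The two thresholds in Figure 5 can be illustrated with a small Python sketch. It assumes the error is the signed relative difference between estimated and true counts expressed as a percentage, and that the thresholds apply to its magnitude. That definition is an assumption made for illustration, and the function names are hypothetical.

```python
def percent_error(estimated, true):
    """Signed relative error in percent (assumed definition; needs true > 0)."""
    if true <= 0:
        raise ValueError("true count must be positive")
    return 100.0 * (estimated - true) / true


def error_class(estimated, true, high=100.0, low=10.0):
    """Label a gene 'high', 'low' or 'other' by the magnitude of its error."""
    magnitude = abs(percent_error(estimated, true))
    if magnitude > high:
        return "high"
    if magnitude < low:
        return "low"
    return "other"


# Hypothetical (estimated, true) read counts for three genes.
labels = [error_class(e, t) for e, t in [(250, 100), (105, 100), (150, 100)]]
```

Under this assumed definition, a gene whose estimate is more than twice (or, for overestimates only, more than double) its true count falls in the high error group, while estimates within 10% of the truth fall in the low error group.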
[Figure 6 plot. Panels: Pipelines; Read Length; Library Tag Type.]


Figure 6: Number of results gathered by pipeline, read length and library 
tag type (SE=single end, PE=Paired-end). 
Aligner  Quant. Method  Overall Rank  Error Rank  Error (mean ± sd)  Spearman Rank  Spearman (mean ± sd)
osa      htseq-ine      20            9           16.09 ± 0.68       11             0.93 ± 0.01
tophat1  htseq-ine      23            11          16.92 ± 2.58       13             0.93 ± 0.01
smalt    htseq-ine      24            15          18.35 ± 8.24       9              0.94 ± 0.00
osa      fluxcapacitor  26            24          19.38 ± 0.87       2              0.95 ± 0.00
tophat2  htseq-ine      26            14          18.91 ± 6.8        12             0.93 ± 0.01
star     fluxcapacitor  27            23          19.12 ± 0.9        4              0.94 ± 0.00
star     htseq-ine      27            10          16.84 ± 2.8        16             0.93 ± 0.01
bwa2     htseq-ine      28            16          20.34 ± 6.05       12             0.93 ± 0.02
gsnap    htseq-ine      31            15          22 ± 10.23         17             0.93 ± 0.01
tophat1  fluxcapacitor  31            25          19.54 ± 0.92       5              0.94 ± 0.00
tophat2  fluxcapacitor  33            27          19.98 ± 1.28       6              0.94 ± 0.01
smalt    htseq-u        35            21          20.98 ± 9.93       14             0.93 ± 0.00
star     cufflinks2     35            8           15.65 ± 0.83       27             0.91 ± 0.01
tophat1  cufflinks2     35            9           22.03 ± 20.29      25             0.91 ± 0.04
bwa2     fluxcapacitor  36            26          20.84 ± 2.99       10             0.94 ± 0.01
osa      htseq-u        36            16          22.84 ± 19.12      21             0.92 ± 0.03
bwa2     htseq-u        37            20          23.75 ± 13.71      17             0.92 ± 0.02
gsnap    fluxcapacitor  37            28          21.92 ± 8.47       10             0.94 ± 0.01
gsnap    htseq-u        38            15          21.18 ± 10.27      23             0.92 ± 0.01
tophat1  htseq-u        38            18          18.77 ± 4.73       20             0.92 ± 0.00
star     htseq-u        39            15          16.99 ± 2.58       23             0.92 ± 0.00
tophat1  cufflinks1     39            11          16.09 ± 0.83       29             0.91 ± 0.01
tophat2  htseq-u        39            19          20.66 ± 8.61       20             0.92 ± 0.01
bwa2     cufflinks2     41            21          21.92 ± 7.76       20             0.92 ± 0.02
osa      cufflinks2     41            14          20.51 ± 9.56       27             0.91 ± 0.03
bwa1     htseq-ine      44            23          24.58 ± 8.97       21             0.91 ± 0.03
osa      cufflinks1     44            15          18.31 ± 6.03       30             0.91 ± 0.01
star     cufflinks1     44            13          17.35 ± 4.49       31             0.91 ± 0.01
tophat2  cufflinks2     44            16          27.4 ± 25.32       29             0.88 ± 0.11
gsnap    cufflinks2     45            17          23.55 ± 16.98      27             0.9 ± 0.05
smalt    fluxcapacitor  46            28          22.97 ± 7.7        18             0.93 ± 0.01
bwa1     htseq-u        49            25          25.64 ± 9.59       24             0.9 ± 0.03
bwa1     fluxcapacitor  50            31          26.57 ± 8.65       20             0.91 ± 0.04
[...]    cufflinks1     52            20          24.56 ± 15.57      32             0.91 ± 0.01
smalt    cufflinks2     52            27          28.88 ± 21.21      25             0.9 ± 0.06
tophat2  cufflinks1     52            20          27.16 ± 19.91      32             0.91 ± 0.01
bwa1     cufflinks2     54            26          33.52 ± 23.29      28             0.87 ± 0.09
smalt    cufflinks1     54            28          29.34 ± 21.16      26             0.9 ± 0.08
bwa2     cufflinks1     58            28          33.82 ± 22.89      30             0.86 ± 0.08
bwa1     cufflinks1     59            28          38.15 ± 29.03      31             0.85 ± 0.11
bowtie2  htseq-ine      61            26          23.11 ± 9.45       35             0.88 ± 0.01
bowtie1  fluxcapacitor  64            34          28.07 ± 7.37       30             0.89 ± 0.04
bowtie2  htseq-u        66            30          25.96 ± 11.25      36             0.87 ± 0.01
bowtie2  fluxcapacitor  72            35          30.24 ± 6.95       37             0.85 ± 0.04
bowtie2  cufflinks1     80            38          32.99 ± 12.91      43             0.81 ± 0.03
bowtie2  cufflinks2     80            39          39.63 ± 19.74      41             0.83 ± 0.03

Table 5: Average rankings of the pipelines across the data sets with
single-end reads. The overall rank was obtained by summing the rankings on
each metric. The average value and standard deviation across data sets are
also shown for each metric. The table is sorted by overall rank (top
corresponds to lowest rank values).
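
A minimal Python sketch of the rank aggregation described in the caption is given below. It assumes that pipelines are ranked within each data set on each metric (lower error is better, higher Spearman correlation is better), that the per-metric ranks are averaged across data sets, and that the two averages are summed. The tie rule, the function names and the example numbers are hypothetical.

```python
from statistics import mean


def ranks(scores, higher_is_better):
    """Map each pipeline to 1 + the number of pipelines that beat it."""
    def beats(a, b):
        return a > b if higher_is_better else a < b

    return {
        name: 1 + sum(beats(other, value) for other in scores.values())
        for name, value in scores.items()
    }


def overall_ranking(error_by_dataset, spearman_by_dataset):
    """Average per-data-set ranks for each metric, then sum the two averages."""
    error_ranks = [ranks(d, higher_is_better=False) for d in error_by_dataset]
    corr_ranks = [ranks(d, higher_is_better=True) for d in spearman_by_dataset]
    table = {}
    for name in error_ranks[0]:
        err = mean(r[name] for r in error_ranks)
        cor = mean(r[name] for r in corr_ranks)
        table[name] = (err + cor, err, cor)
    # Lowest overall value first, matching the sort order of the table.
    return sorted(table.items(), key=lambda item: item[1][0])


# Hypothetical metrics for three pipelines on two data sets.
errors = [{"a": 16.0, "b": 20.0, "c": 30.0}, {"a": 18.0, "b": 17.0, "c": 35.0}]
corrs = [{"a": 0.93, "b": 0.95, "c": 0.85}, {"a": 0.94, "b": 0.95, "c": 0.80}]
result = overall_ranking(errors, corrs)
```

In this toy example pipeline "b" comes first: it ties with "a" on the average error rank (1.5 each) but has the best average Spearman rank, which shows how a pipeline can lead the overall ranking without having the lowest error on every data set.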
Aligner  Quant. Method  Overall Rank  Error Rank  Error (mean ± sd)  Spearman Rank  Spearman (mean ± sd)
tophat1  htseq-ine      12            9           17.65 ± 2.5        3              0.94 ± 0.00
gsnap    htseq-ine      15            8           17.59 ± 2.61       7              0.94 ± 0.00
osa      htseq-ine      15            9           17.68 ± 2.55       6              0.94 ± 0.00
tophat2  htseq-ine      19            11          19.31 ± 5.07       8              0.94 ± 0.00
star     htseq-ine      21            11          17.99 ± 2.8        10             0.94 ± 0.00
osa      fluxcapacitor  23            21          20.51 ± 2.67       2              0.95 ± 0.00
tophat1  fluxcapacitor  23            20          20.04 ± 2.82       3              0.95 ± 0.00
osa      cufflinks2     27            12          21.14 ± 9.91       15             0.93 ± 0.01
smalt    htseq-ine      27            12          19.55 ± 6.19       14             0.93 ± 0.00
tophat1  cufflinks2     29            15          23.25 ± 13.86      14             0.93 ± 0.01
star     fluxcapacitor  31            24          21.9 ± 3.22        7              0.94 ± 0.00
gsnap    cufflinks2     32            14          24.72 ± 18.34      18             0.92 ± 0.04
osa      cufflinks1     33            11          23.18 ± 15.1       22             0.92 ± 0.01
star     cufflinks2     33            15          18.15 ± 3.69       18             0.93 ± 0.01
bwa1     htseq-ine      34            19          23.05 ± 7.14       14             0.92 ± 0.02
star     cufflinks1     34            10          17.98 ± 4.16       24             0.92 ± 0.01
tophat1  cufflinks1     34            16          26.65 ± 19.29      18             0.92 ± 0.01
tophat1  htseq-u        34            17          21.06 ± 4.81       17             0.93 ± 0.00
tophat2  cufflinks2     35            13          20.65 ± 7.89       22             0.92 ± 0.01
gsnap    fluxcapacitor  36            24          23.56 ± 3.21       12             0.93 ± 0.00
osa      htseq-u        36            16          21.13 ± 8.71       20             0.92 ± 0.00
tophat2  fluxcapacitor  36            24          24.36 ± 5.37       11             0.93 ± 0.01
gsnap    cufflinks1     38            14          26.06 ± 18.71      24             0.9 ± 0.05
smalt    htseq-u        41            16          19.46 ± 3.48       25             0.92 ± 0.00
bwa2     cufflinks2     42            23          29.01 ± 19.91      19             0.91 ± 0.07
gsnap    htseq-u        42            18          26.48 ± 16.79      24             0.92 ± 0.01
tophat2  htseq-u        42            19          22.34 ± 6.28       22             0.92 ± 0.00
star     htseq-u        45            20          20.18 ± 5.97       25             0.92 ± 0.00
tophat2  cufflinks1     45            16          27.62 ± 23.49      29             0.89 ± 0.09
bwa1     htseq-u        47            27          26.72 ± 7.57       20             0.92 ± 0.02
bwa2     cufflinks1     47            24          34.07 ± 22.23      23             0.89 ± 0.08
smalt    cufflinks2     48            21          21.87 ± 7.75       27             0.91 ± 0.02
smalt    cufflinks1     50            21          23.94 ± 10.8       29             0.91 ± 0.02
bwa1     cufflinks2     51            27          29.56 ± 12.05      25             0.89 ± 0.05
bwa1     cufflinks1     52            26          34.92 ± 17.92      26             0.88 ± 0.05
bwa1     fluxcapacitor  58            32          35.37 ± 11.82      26             0.88 ± 0.05
smalt    fluxcapacitor  58            28          30.21 ± 8.77       30             0.89 ± 0.03
bowtie2  htseq-ine      67            32          37.74 ± 11.35      35             0.84 ± 0.04
bowtie1  fluxcapacitor  69            35          39.95 ± 11.73      34             0.85 ± 0.03
bowtie2  htseq-u        71            34          41.58 ± 14.96      37             0.83 ± 0.04
bowtie2  cufflinks2     73            35          46.67 ± 22.7       38             0.8 ± 0.09
bowtie2  fluxcapacitor  73            35          39.02 ± 7.66       38             0.81 ± 0.05
bowtie2  cufflinks1     74            35          45.36 ± 20.72      39             0.8 ± 0.06



Table 6: Average rankings of the pipelines across the data sets with
paired-end reads. The overall rank was obtained by summing the rankings on
each metric. The average value and standard deviation across data sets are
also shown for each metric. The table is sorted by overall rank (top
corresponds to lowest rank values).
[Figure 7 plot. Panels: A) Estimated number of reads; B) True number of reads.
X-axis: Reads / Sum of gene's transcript lengths / Depth. Legend: tophat1 x
htseq-ine, osa x htseq-ine, tophat1 x fluxcapacitor, osa x fluxcapacitor,
tophat1 x cufflinks2, osa x cufflinks2.]


Figure 7: Error by number of reads (normalized per gene using the sum of
the transcript lengths of a gene and the sequencing depth of the data set) for
multiple pipelines and 16 data sets (single-end). The lines shown are lowess
regressions of the errors per gene and data set. A) the number of reads per
gene used was inferred by the pipeline; B) the number of reads used
corresponds to the true number of reads per gene.
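
The x-axis normalization in Figure 7 can be read as the read count of a gene divided by the summed length of its transcripts and by the sequencing depth of the data set. The Python sketch below follows that reading. The function name and example values are hypothetical, and the exact formula is an assumption based on the axis label and caption.

```python
def normalized_reads(reads, transcript_lengths, depth):
    """Reads for one gene divided by summed transcript length and by depth."""
    total_length = sum(transcript_lengths)
    if total_length <= 0 or depth <= 0:
        raise ValueError("transcript length sum and depth must be positive")
    return reads / total_length / depth


# Hypothetical gene: 600 reads, two transcripts of 1500 nt, depth 10.
value = normalized_reads(600, [1500, 1500], 10)
```

Dividing by both quantities puts genes of different sizes and data sets of different depths on a common scale, so the lowess curves from the 16 data sets can share one axis.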
Figure 8: Percentage of the gene length "explained" by exons with a length
shorter than 200 nucleotides. 
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I I I 

Other Low error High error 



Figure 9: Gene sequence uniqueness: where N is the number of locations 
in the genome similar to the gene's sequence. 
Figure 10: Number of genes with a positive or negative error across all data
sets and pipelines. 
Ensembl ID       Name                                                                                    GC %   Chr    N. Trans.  Length
ENSG00000166295  [...] promoting complex subunit [...]                                                   43.40  10     [...]      2814
ENSG00000089006  SNX5 - sorting nexin 5                                                                  42.13  20     11         4973
ENSG00000101294  HM13 - histocompatibility (minor) 13                                                    49.35  20     5          5928
ENSG00000171863  RPS7 - ribosomal protein S7                                                             44.76  2      4          5581
ENSG00000163541  SUCLG1 - succinate-CoA ligase, alpha subunit                                            39.25  2      5          4273
ENSG00000177082  WDR73 - WD repeat domain 73                                                             49.15  15     4          6135
ENSG00000140553  UNC45A - unc-45 homolog A (C. elegans)                                                  53.82  15     4          7930
ENSG00000082068  WDR70 - WD repeat domain 70                                                             39     [...]  2          7021
ENSG00000197375  [...] (organic cation/carnitine transporter), member [...]                              48.32  5      9          12778
ENSG00000011485  PPP5C - protein phosphatase 5, catalytic subunit                                        50.75  19     20         7015
ENSG00000213930  GALT - galactose-1-phosphate uridylyltransferase                                        50.16  9      9          4359
ENSG00000213213  KIAA1984 - KIAA1984                                                                     59.33  9      2          6065
[...]            [...]                                                                                   60.84  9      3          3816
ENSG00000168676  KCTD19 - potassium channel tetramerisation domain containing 19                         47.44  16     5          7366
ENSG00000103187  COTL1 - coactosin-like 1 (Dictyostelium)                                                50.24  16     2          8839
ENSG00000122566  [...]tein A2/B1                                                                         40.07  [...]  [...]      7532
ENSG00000106258  CYP3A5 - cytochrome P450, family 3, subfamily A, [...]                                  40.47  7      5          6717
ENSG00000105971  CAV2 - caveolin 2                                                                       37.86  7      3          6633
ENSG00000154438  [...] basic leucine zipper domain containing 1                                          35.27  7      11         3052
ENSG00000196329  GIMAP5 - GTPase, IMAP family member 5                                                   43.79  7      1          6026
ENSG00000198912  C1orf174 - chromosome 1 open reading frame 174                                          49.32  1      1          4384
ENSG00000142920  ADC - arginine decarboxylase                                                            45.93  1      20         6550
ENSG00000116898  MRPS15 - mitochondrial ribosomal protein S15                                            49.28  1      1          2908
ENSG00000159214  CCDC24 - coiled-coil domain containing 24                                               57.95  1      1          3548
ENSG00000126088  UROD - uroporphyrinogen decarboxylase                                                   52.29  1      3          2905
ENSG00000117481  NSUN4 - NOP2/Sun domain family, member 4                                                47.35  1      4          8461
ENSG00000187889  C1orf168 - chromosome 1 open reading frame 168                                          38.48  1      1          4611
ENSG00000203965  EFCAB7 - EF-hand calcium binding domain 7                                               35.15  1      1          5688
ENSG00000125462  C1orf61 - chromosome 1 open reading frame 61                                            51.53  1      5          8378
ENSG00000127074  RGS13 - regulator of G-protein signaling 13                                             35.75  1      2          5864
ENSG00000159176  CSRP1 - cysteine and glycine-rich protein 1                                             50.68  1      7          10311
ENSG00000134548  C12orf39 - chromosome 12 open reading frame 39                                          38.50  12     2          2925
ENSG00000111786  SRSF9 - serine/arginine-rich splicing factor 9                                          46.53  12     3          3740
ENSG00000204348  DOM3Z - dom-3 homolog Z (C. elegans)                                                    58.72  6      5          2482
ENSG00000114857  NKTR - natural killer-tumor recognition sequence                                        37.54  3      4          17336
ENSG00000237765  FAM200B - family with sequence similarity 200, member B                                 40.86  4      3          4812
ENSG00000157379  DHRS1 - dehydrogenase/reductase (SDR family) member 1                                   48.04  14     6          4994
ENSG00000054690  PLEKHH1 - pleckstrin homology domain containing, family H (with MyTH4 domain) member 1  48.02  14     14         10788
ENSG00000185189  NRBP2 - nuclear receptor binding protein 2                                              62.45  8      6          4921
ENSG00000133812  SBF2 - SET binding factor 2                                                             38.31  11     18         16722
ENSG00000109920  FNBP4 - formin binding protein 4                                                        43.43  11     2          7581
ENSG00000187066  AP003068.6.1                                                                            55.56  11     3          4227
ENSG00000149294  NCAM1 - neural cell adhesion molecule 1                                                 41.71  11     41         12734



Table 7: Genes with consistent high error (greater than 100%) across most
pipelines and data sets: Ensembl gene ID; gene name; percentage of GC
content; location (chromosome); number of transcripts; gene length (sum
of the length of the exons).
Ensembl ID: GO terms

ENSG00000166295: protein ubiquitination; protein ubiquitination; mitosis; cell division; cytoplasm; anaphase-promoting complex

ENSG00000089006: pinocytosis; cell communication; protein transport; ruffle; phagocytic cup; cytoplasmic vesicle membrane; extrinsic to internal side of plasma membrane; extrinsic to endosome membrane; early endosome membrane; macropinocytic cup; phosphatidylinositol binding; phosphatidylinositol binding

ENSG00000101294: membrane protein proteolysis; plasma membrane; endoplasmic reticulum; rough endoplasmic reticulum; cell surface; integral to cytosolic side of endoplasmic reticulum membrane; integral to lumenal side of endoplasmic reticulum membrane; protein binding; peptidase activity; aspartic endopeptidase activity, intramembrane cleaving; protein homodimerization activity

ENSG00000171863: nuclear-transcribed mRNA catabolic process, nonsense-mediated decay; rRNA processing; translation; translation; translation; translational initiation; translational elongation; translational termination; SRP-dependent cotranslational protein targeting to membrane; viral reproduction; gene expression; RNA metabolic process; mRNA metabolic process; viral infectious cycle; viral transcription; ribosomal small subunit biogenesis; cellular protein metabolic process; cytosolic small ribosomal subunit; cytosolic small ribosomal subunit; ribonucleoprotein complex; cytosol; ribosome; nucleus; nucleolus; microtubule organizing center; 90S preribosome; small-subunit processome; protein binding; RNA binding; structural constituent of ribosome

ENSG00000163541: tricarboxylic acid cycle; tricarboxylic acid cycle; succinyl-CoA metabolic process; succinate metabolic process; small molecule metabolic process; plasma membrane; mitochondrion; cytoplasm; mitochondrial inner membrane; mitochondrial matrix; succinate-CoA ligase complex (GDP-forming); ATP citrate synthase activity; succinate-CoA ligase (ADP-forming) activity; succinate-CoA ligase (GDP-forming) activity; GTP binding; GDP binding; protein heterodimerization activity; cofactor binding

ENSG00000140553: muscle organ development; cell differentiation; chaperone-mediated protein folding; nucleus; perinuclear region of cytoplasm; Hsp90 protein binding

ENSG00000197375: sodium ion transport; drug transmembrane transport; quaternary ammonium group transport; carnitine transport; carnitine transport; drug transport; quorum sensing involved in interaction with host; transmembrane transport; positive regulation of intestinal epithelial structure maintenance; sodium-dependent organic cation transport; plasma membrane; plasma membrane; integral to membrane; basolateral plasma membrane; apical plasma membrane; brush border membrane; brush border membrane; protein binding; ATP binding; carnitine transporter activity; carnitine transporter activity; drug transmembrane transporter activity; symporter activity; quaternary ammonium group transmembrane transporter activity; PDZ domain binding; antibiotic transporter activity

ENSG00000011485: signal transduction; transcription, DNA-dependent; protein dephosphorylation; mitosis; positive regulation of I-kappaB kinase/NF-kappaB cascade; response to morphine; cytosol; nucleus; cytoplasm; Golgi apparatus; neuron projection; neuronal cell body; protein binding; protein serine/threonine phosphatase activity; signal transducer activity; metal ion binding; identical protein binding

ENSG00000213930: carbohydrate metabolic process; galactose metabolic process; UDP-glucose catabolic process; galactose catabolic process; small molecule metabolic process; cytosol; Golgi apparatus; UDP-glucose:hexose-1-phosphate uridylyltransferase activity; zinc ion binding

ENSG00000168676: protein homooligomerization

ENSG00000103187: defense response to fungus; biological_process; cellular_component; cytoplasm; cytoskeleton; protein binding; actin binding; enzyme binding

ENSG00000122566: nuclear mRNA splicing, via spliceosome; nuclear mRNA splicing, via spliceosome; mRNA processing; RNA splicing; gene expression; RNA transport; ribonucleoprotein complex; nucleus; cytoplasm; nucleoplasm; spliceosomal complex; nucleolus; heterogeneous nuclear ribonucleoprotein complex; catalytic step 2 spliceosome; nucleotide binding; protein binding; RNA binding; single-stranded telomeric DNA binding

ENSG00000106258: xenobiotic metabolic process; steroid metabolic process; alkaloid catabolic process; drug catabolic process; small molecule metabolic process; oxidative demethylation; endoplasmic reticulum membrane; electron carrier activity; monooxygenase activity; oxidoreductase activity; oxygen binding; heme binding; aromatase activity

ENSG00000105971: negative regulation of endothelial cell proliferation; vesicle fusion; mitochondrion organization; endoplasmic reticulum organization; regulation of mitosis; synaptic transmission; vesicle organization; positive regulation of dopamine receptor signaling pathway; vesicle docking; skeletal muscle fiber development; protein oligomerization; caveola assembly; plasma membrane; Golgi membrane; intracellular; acrosomal membrane; cytosol; integral to plasma membrane; nucleus; Golgi apparatus; transport vesicle; lipid particle; caveola; cell surface; extrinsic to internal side of plasma membrane; protein complex; membrane raft; perinuclear region of cytoplasm; protein binding; syntaxin binding; D1 dopamine receptor binding; protein homodimerization activity; phosphoprotein binding

ENSG00000154438: signal transduction; male meiosis; multicellular organismal development; spermatogenesis; cell differentiation; gene silencing by RNA; piRNA metabolic process; DNA methylation involved in gamete generation; cytoplasm; pi-body; signal transducer activity



Table 8: GO terms of the genes with consistent high error (greater than 
100%) across most pipelines and data sets (part 1/2). 
Ensembl ID: GO terms

ENSG00000196329: temperature homeostasis; positive regulation of natural killer cell cytokine production; positive regulation of humoral immune response mediated by circulating immunoglobulin; positive regulation of calcium ion transport into cytosol; T cell differentiation; negative regulation of interferon-gamma production; positive regulation of CD4-positive, CD25-positive, alpha-beta regulatory T cell differentiation; myeloid dendritic cell differentiation; T cell homeostasis; negative regulation of apoptotic process; negative regulation of nitric oxide biosynthetic process; positive regulation of gamma-delta T cell differentiation; positive regulation of membrane potential; positive regulation of natural killer cell mediated cytotoxicity; regulation of mitochondrial membrane permeability; negative regulation of T cell activation; negative regulation of lipid catabolic process; integral to membrane; lysosome; mitochondrial outer membrane; GTP binding

ENSG00000198912: nucleus

ENSG00000142920: ornithine metabolic process; polyamine metabolic process; polyamine biosynthetic process; spermatogenesis; cellular nitrogen compound metabolic process; small molecule metabolic process; agmatine biosynthetic process; mitochondrion; cytosol; arginine decarboxylase activity

ENSG00000116898: translation; mitochondrion; mitochondrial small ribosomal subunit; nuclear membrane; structural constituent of ribosome

ENSG00000126088: liver development; porphyrin-containing compound metabolic process; protoporphyrinogen IX biosynthetic process; heme biosynthetic process; heme biosynthetic process; response to iron ion; response to organic cyclic compound; response to amine stimulus; response to mercury ion; response to estradiol stimulus; small molecule metabolic process; response to ethanol; uroporphyrinogen III metabolic process; response to methylmercury; response to fungicide; cellular response to arsenic-containing substance; cytosol; nucleus; cytoplasm; microtubule cytoskeleton; uroporphyrinogen decarboxylase activity; uroporphyrinogen decarboxylase activity; ferrous iron binding

ENSG00000117481: mitochondrial large ribosomal subunit; methyltransferase activity

ENSG00000203965: calcium ion binding

ENSG00000125462: nucleus

ENSG00000127074: G-protein coupled receptor signaling pathway; termination of G-protein coupled receptor signaling pathway; positive regulation of GTPase activity; plasma membrane; cytosol; nucleus; cytoplasm; GTPase activator activity

ENSG00000159176: nucleus; zinc ion binding

ENSG00000134548: extracellular region; nucleus; intracellular membrane-bounded organelle; transport vesicle

ENSG00000111786: nuclear mRNA splicing, via spliceosome; transcription from RNA polymerase II promoter; termination of RNA polymerase II transcription; mRNA splice site selection; mRNA processing; mRNA export from nucleus; RNA splicing; gene expression; mRNA 3'-end processing; negative regulation of nuclear mRNA splicing, via spliceosome; nucleoplasm; nucleotide binding; RNA binding

ENSG00000204348: nucleotide binding; metal ion binding

ENSG00000114857: protein folding; membrane; peptidyl-prolyl cis-trans isomerase activity; cyclosporin A binding

ENSG00000237765: nucleic acid binding

ENSG00000157379: endoplasmic reticulum; mitochondrial inner membrane; nucleotide binding; oxidoreductase activity

ENSG00000054690: cytoskeleton; phospholipid binding

ENSG00000185189: negative regulation of macroautophagy; neuron differentiation; negative regulation of neuron apoptotic process; cytoplasm

ENSG00000133812: myelination; protein tetramerization; membrane; vacuolar membrane; protein binding; phosphatase activity; phosphatase regulator activity; phosphatase binding; phosphatidylinositol binding; protein homodimerization activity

ENSG00000149294: cell adhesion; axon guidance; cytokine-mediated signaling pathway; homotypic cell-cell adhesion; positive regulation of calcium-mediated signaling; interferon-gamma-mediated signaling pathway; plasma membrane; Golgi membrane; integral to membrane; extracellular region; external side of plasma membrane; cell surface; anchored to membrane; axon; neuronal cell body



Table 9: GO terms for the genes with consistent high error (greater than 
100%) across most pipelines and data sets (part 2/2). 